In this workshop, we will go over how to use R and Python on Hoffman2.
We will explore:
This presentation and accompanying materials are available on 🔗 UCLA OARC GitHub Repository
You can view the slides in:
Each file provides detailed instructions and examples on the various topics covered in this workshop.
Note: 🛠️ This presentation was built using Quarto and RStudio.
Hoffman2 supports running 🐍 Python applications.
It is HIGHLY recommended to use a version of Python that has been built and tested by Hoffman2 staff.
These versions of Python can be accessed with module load commands. Using system Python builds (e.g. /usr/bin/python) is NOT recommended.
/u/local/apps/python/3.7.3/gcc-4.8.5/bin/python3
Once you load the Python module, you can run Python with either an interactive (qrsh) or batch (qsub) job.
Get access to a compute node and load python module
Run Python
Note
With the Hoffman2 builds of Python, as with most Python builds from source, the command for Python 3.x is python3, while the command for version 2.7 is python.
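To confirm which interpreter a session is actually using, a short check like the following can help. This is a minimal sketch: the helper name `is_system_python` and the list of system prefixes are illustrative assumptions, not part of any Hoffman2 tooling.

```python
import sys

# Assumed locations of OS-provided interpreters; adjust for your system.
SYSTEM_PREFIXES = ("/usr/bin/", "/bin/")


def is_system_python(executable):
    """Return True if the interpreter path looks like a system build."""
    return executable.startswith(SYSTEM_PREFIXES)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Version:", ".".join(map(str, sys.version_info[:3])))
    if is_system_python(sys.executable):
        print("Warning: this looks like the system Python; load a python module first.")
```

Running this with python3 after loading a module should report a path under the module's install directory rather than /usr/bin.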
You can create a job script to run python as a Batch job
For example, we create a job script, named foopy.job
#!/bin/bash
#$ -cwd
#$ -o joblog.$JOB_ID
#$ -j y
#$ -l h_rt=1:00:00,h_data=10G
#$ -pe shared 1
# load the job environment:
. /u/local/Modules/default/init/modules.sh
module load python/3.7.3
python3 foo.py
Then we will submit this job
Note
Note that this job script uses one core (-pe shared 1).
You can request more cores by modifying -pe, but your Python code MUST also be modified to actually use multiple cores.
See our workshop on Big Data for some ways to parallelize your Python code.
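As one possible illustration of code that uses the requested cores, the sketch below sizes a worker pool from the NSLOTS environment variable. It assumes the scheduler exports NSLOTS for jobs that request a parallel environment, as Grid Engine schedulers typically do; the function names and the toy `square` workload are hypothetical.

```python
import os
from multiprocessing import Pool


def slots_from_env(env=None):
    """Number of cores granted to the job, read from NSLOTS (default 1)."""
    env = os.environ if env is None else env
    try:
        return max(1, int(env.get("NSLOTS", "1")))
    except ValueError:
        # NSLOTS was set to something that is not an integer.
        return 1


def square(x):
    """Toy workload standing in for a real computation."""
    return x * x


if __name__ == "__main__":
    n_workers = slots_from_env()
    # One worker process per requested core.
    with Pool(processes=n_workers) as pool:
        results = pool.map(square, range(10))
    print(f"Used {n_workers} worker(s):", results)
```

With -pe shared 4 in the job script, this sketch would start four workers; outside a job, where NSLOTS is unset, it falls back to one.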
The builds of Python on Hoffman2 ONLY have the basic compiler/interpreter and a few basic packages.
If you would like to use a package such as scikit-learn, you can install it with the pip (PyPI) package manager.
You will notice the --user flag on pip3. It ensures that the package is installed in your $HOME directory.
Normally, pip will try to install into the main Python build, where users do not have write access, so you may see errors when building your package. With --user, pip installs into $HOME/.local by default.
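To see where a --user install will land for the Python you have loaded, the standard library `site` module can report the per-user locations. This is a small sketch; the wrapper function name is an illustrative assumption.

```python
import site
import sys


def user_install_locations():
    """Return the per-user base and site-packages directories for this interpreter."""
    return {
        "user_base": site.getuserbase(),
        "user_site": site.getusersitepackages(),
    }


if __name__ == "__main__":
    loc = user_install_locations()
    print("User base:", loc["user_base"])
    print("User site-packages:", loc["user_site"])
    # The user site directory is only added to sys.path if it exists.
    print("On sys.path:", loc["user_site"] in sys.path)
```

The user site-packages path includes the Python version, which is why packages installed with one Python version are not picked up by another.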
If you want to install this package in another directory, you can use any directory where you have write access.
For example, if you want to install this package in $SCRATCH/python-pkg,
module load python/3.7.3
pip3 install scikit-learn -t $SCRATCH/python-pkg
export PYTHONPATH=$SCRATCH/python-pkg:$PYTHONPATH
When running your jobs, you will have to make sure that the $PYTHONPATH variable includes any custom locations of your packages.
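Inside a script, you can verify that the custom location is visible, or add it for the current process only. The sketch below assumes the $SCRATCH/python-pkg directory from the pip -t example; `prepend_path` is a hypothetical helper.

```python
import os
import sys


def prepend_path(directory, search_path):
    """Return a copy of search_path with directory first, without duplicates."""
    return [directory] + [p for p in search_path if p != directory]


if __name__ == "__main__":
    # Assumed location, matching the pip -t example.
    scratch = os.environ.get("SCRATCH", "")
    pkg_dir = os.path.join(scratch, "python-pkg")
    if pkg_dir in sys.path:
        print(pkg_dir, "is already on sys.path (PYTHONPATH is set)")
    else:
        sys.path[:] = prepend_path(pkg_dir, sys.path)
        print("Added", pkg_dir, "to sys.path for this process only")
```

Setting PYTHONPATH in the job script remains the simpler option, since it applies to every Python process the job starts.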
Warning
Users need to be careful when installing multiple packages in $HOME/.local with pip3 --user.
Installing conflicting packages with different versions of Python will lead to errors.
When working on different projects, it is best to use a Virtual Environment or Anaconda to install Python packages and dependencies.
One great way to manage your Python packages is Python’s Virtual Environment feature. It allows users to install and manage Python inside an environment that they control and that is located in their own directories. It is an isolated runtime environment, so users can set unique versions and dependencies for their applications.
This is useful when you are running multiple applications that require different environments.
To create an environment, first load the base 🐍 Python version on Hoffman2 that you want to use. For example, if you want to use python/3.7.3:
Then create the env in the directory location where you want to install it.
This will create an env named mypython3.7.3. Once it is created, you can activate the env by:
This is a bash shell script that will activate your new Python environment. From here, you can manage packages with pip or any other installation method.
Note
Since this is a new, custom python env that you created, you don’t need to add --user flags in pip. All new libraries and packages will be installed by default inside the new environment ($HOME/myenv/mypython3.7.3)
More information about using 🐍 Python Virtual Environments can be found here
Tip
For python version 3.x, the commands python and python3 are the same in virtual environments
You can run which python and which python3 to make sure you are running the python in the correct location
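The same check can be done from inside Python by comparing `sys.prefix` with `sys.base_prefix`, which differ when a virtual environment created with venv is active. This is a minimal sketch; the function name and the example paths in the comments are illustrative.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True if the running interpreter belongs to a virtual environment."""
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        # base_prefix points at the Python build the env was created from.
        base_prefix = getattr(sys, "base_prefix", prefix)
    return prefix != base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("sys.prefix:", sys.prefix)
    print("Inside a virtual environment:", in_virtualenv())
```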
Hoffman2 supports running R applications.
You can see all the available versions of R on Hoffman2 by running:
Then you can load R
In order to use the correct build of R, you will need to load the gcc or intel module first, based on what was shown by modules_lookup. This ensures that the correct versions of the gcc or intel libraries are loaded.
Once you load the R module, you can run R with either an interactive (qrsh) or Batch job (qsub)
Get access to a compute node
Load R module
Run R
You can create a job script to run R as a Batch job
For example, we create a job script, named fooR.job
#!/bin/bash
#$ -cwd
#$ -o joblog.$JOB_ID
#$ -j y
#$ -l h_rt=1:00:00,h_data=10G
#$ -pe shared 1
# load the job environment:
. /u/local/Modules/default/init/modules.sh
module load R/4.2.2
R CMD BATCH fooR.R
Then we will submit this job
When installing R packages, a common way is to use
Typically, R installs new packages in the main R global directory. However, on Hoffman2 (and other 💻 HPC resources), you will not be able to modify this directory.
Example install:
R will prompt you with a new path located in your $HOME directory. This directory path is determined by $R_LIBS_USER.
Each R module on Hoffman2 has a unique $R_LIBS_USER to avoid conflicts when using different versions of R.
R may also ask you to select a CRAN mirror. You can choose 1 to use the CRAN https cloud server.
Sometimes R can fail during the install process. R can print a lot of output during the install.packages() step.
This output can be very long, but looking through it can give you clues about what you need to do to fix the problem.
Look for lines that look like C/C++ errors, or lines that contain ‘ERROR’ or ‘no such file or directory’.
For example, the R package ‘glmnet’ will error when using the R/4.2.2 module with no gcc module, since it will use the default gcc 4.8.5 version that is too old for the ‘glmnet’ package.
The message will say “Error: C++17 standard requested but CXX17 is not defined”, which means you need a version of gcc that supports at least the C++17 standard.
To fix this, you can use R/4.2.2 with gcc version 10.2.0
When installing packages, you may need to add external software and libraries to successfully install the package.
For example, if you need to install the nloptr package, you will need to have the nlopt libraries already installed and loaded in your shell session. Hoffman2 already has this library, so you only need to load the module before you try to install the package.
If you run
You may see an error line
To fix this, you will load the nlopt module with R
Sometimes you will want to use a different directory location to install your packages. Maybe you are running low on space in your $HOME directory, or you have group space with more storage. You can update .libPaths() to include a directory where R will install and find your packages.
# Assign the current library paths to a variable
current_paths <- .libPaths()
# Prepend the new directory so that install.packages() uses it by default
new_library_path <- "/path/to/new/library"
updated_paths <- c(new_library_path, current_paths)
# Update the library search paths
.libPaths(updated_paths)
If you decide to use this new directory to store your R packages, make sure that all of your R scripts are updated to find this directory every time you run R.
Anaconda is a very popular Python and R distribution. It is a great option for simplifying package management and pipelines. Hoffman2 has Anaconda installed, and users can create their own conda environments from it.
To create an environment, you will first need to load the Anaconda module
Warning
There is NO need to load other Python or R modules. Your Anaconda environment will have its own build of Python and/or R, so loading other modules may cause conflicts.
Note
There is NO need to run conda init. The Anaconda environment is already loaded when you run the conda.sh script. Running conda init will modify your .bashrc file and may break the env you created.
After you load anaconda, you will now create your new environment
This will create a Python env, version 3.7, with the name mypython3.7. Then it will activate the new environment. From here you can install any package you want with conda install or pip install. Do NOT use --user with pip in this case. You want the package installed in the conda env, not in $HOME/.local.
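To confirm that the Python you are running belongs to the activated conda env, you can inspect the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that conda sets on activation. This is a hedged sketch: the helper names are hypothetical, and the env paths used in any example are placeholders.

```python
import os
import sys


def active_conda_env(env=None):
    """Return (env name, env prefix) as set by conda activate, or (None, None)."""
    env = os.environ if env is None else env
    return env.get("CONDA_DEFAULT_ENV"), env.get("CONDA_PREFIX")


def interpreter_in_env(executable, conda_prefix):
    """Return True if the interpreter path lives under the conda env prefix."""
    if not conda_prefix:
        return False
    # Assumes POSIX-style paths, as on a Linux cluster.
    return executable.startswith(conda_prefix.rstrip("/") + "/")


if __name__ == "__main__":
    name, prefix = active_conda_env()
    print("Active conda env:", name)
    print("Env prefix:", prefix)
    print("Interpreter inside env:", interpreter_in_env(sys.executable, prefix))
```

If the last line prints False while an env is active, another Python (for example from a loaded module) is probably ahead of the env on your PATH.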
By default, conda envs are installed in the $HOME/.conda directory. You can create envs in other directories by
This will install the conda env in a custom location. You can install your conda env in a shared PROJECT directory so they can be shared with other users.
For more information, you can review our workshop on using Anaconda on Hoffman2.
Jupyter can be executed on Hoffman2. This process involves running the jupyter notebook/lab command on a Hoffman2 node, forwarding the Jupyter port to your local machine, and accessing Jupyter via your local web browser.
However, for a more streamlined experience, we have created a script, h2jupynb, which automatically creates a Jupyter session on a Hoffman2 compute node.
More information can be found on our website
You can conveniently interact with Hoffman2 using RStudio Server, allowing you to utilize R in a familiar, intuitive environment.
Find detailed instructions for using RStudio on Hoffman2 on our GitHub page
We’ve also conducted a dedicated workshop on using RStudio on Hoffman2. Feel free to explore this resource for additional insights and usage tips.
Questions? Comments?
Look out for more Hoffman2 workshops at https://idre.ucla.edu/calendar
Fill out our assessment form